The search functionality is under construction.

Author Search Result

[Author] Masao YANAGISAWA(78hit)

61-78hit(78hit)

  • A Highly-Adaptable and Small-Sized In-Field Power Analyzer for Low-Power IoT Devices

    Ryosuke KITAYAMA  Takashi TAKENAKA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2348-2362

    Power analysis for IoT devices is strongly required to protect attacks from malicious attackers. It is also very important to reduce power consumption itself of IoT devices. In this paper, we propose a highly-adaptable and small-sized in-field power analyzer for low-power IoT devices. The proposed power analyzer has the following advantages: (A) The proposed power analyzer realizes signal-averaging noise reduction with synchronization signal lines and thus it can reduce wide frequency range of noises; (B) The proposed power analyzer partitions a long-term power analysis process into several analysis segments and measures voltages and currents of each analysis segment by using small amount of data memories. By combining these analysis segments, we can obtain long-term analysis results; (C) The proposed power analyzer has two amplifiers that amplify current signals adaptively depending on their magnitude. Hence maximum readable current can be increased with keeping minimum readable current small enough. Since all of (A), (B) and (C) do not require complicated mechanisms nor circuits, the proposed power analyzer is implemented on just a 2.5cm×3.3cm board, which is the smallest size among the other existing power analyzers for IoT devices. We have measured power and energy consumption of the AES encryption process on the IoT device and demonstrated that the proposed power analyzer has only up to 1.17% measurement errors compared to a high-precision oscilloscope.

  • A Bit-Write-Reducing and Error-Correcting Code Generation Method by Clustering ECC Codewords for Non-Volatile Memories

    Tatsuro KOJO  Masashi TAWADA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2398-2411

    Non-volatile memories are paid attention to as a promising alternative to memory design. Data stored in them still may be destructed due to crosstalk and radiation. We can restore the data by using error-correcting codes which require extra bits to correct bit errors. Further, non-volatile memories consume ten to hundred times more energy than normal memories in bit-writing. When we configure them using error-correcting codes, it is quite necessary to reduce writing bits. In this paper, we propose a method to generate a bit-write-reducing code with error-correcting ability. We first pick up an error-correcting code which can correct t-bit errors. We cluster its codeswords and generate a cluster graph satisfying the S-bit flip conditions. We assign a data to be written to each cluster. In other words, we generate one-to-many mapping from each data to the codewords in the cluster. We prove that, if the cluster graph is a complete graph, every data in a memory cell can be re-written into another data by flipping at most S bits keeping error-correcting ability to t bits. We further propose an efficient method to cluster error-correcting codewords. Experimental results show that the bit-write-reducing and error-correcting codes generated by our proposed method efficiently reduce energy consumption. This paper proposes the world-first theoretically near-optimal bit-write-reducing code with error-correcting ability based on the efficient coding theories.

  • A Retargetable Simulator Generator for DSP Processor Cores with Packed SIMD-type Instructions

    Nozomu TOGAWA  Kyosuke KASAHARA  Yuichiro MIYAOKA  Jinku CHOI  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Simulation Accelerator

      Vol:
    E86-A No:12
      Page(s):
    3099-3109

    A packed SIMD type operation or a SIMD operation is n-parallel b/n-bit sub-operations executed by the modified n-bit functional unit. Such a functional unit is called a SIMD functional unit and a processor core which can execute SIMD operations is called a SIMD processor core. SIMD operations can be effectively applied to image processing applications. This paper focuses on hardware/software cosynthesis of SIMD processor cores and particularly proposes a new simulator generator which simulates pipelined instructions for a SIMD processor. Generally, a SIMD functional unit has many options and then we can have so many different SIMD functional unit instances. However, since our hardware/software cosynthesis system synthesizes a special-purpose processor core for an input application program, it uses very limited SIMD functional unit instances. In the proposed approach, we consider a SIMD operation to be a set of SIMD sub-operations. By adding up the appropriate SIMD sub-operations, we construct a single SIMD operation. Then a SIMD functional unit behavior can be characterized by a collection of SIMD operations. This approach has the advantage that: if we have a small number of behavior libraries for SIMD sub-operations, we can instantiate a particular SIMD functional unit behavior. Experimental results demonstrate the effectiveness of the proposed approach.

  • Empirical Evaluation and Optimization of Hardware-Trojan Classification for Gate-Level Netlists Based on Multi-Layer Neural Networks

    Kento HASEGAWA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    LETTER

      Vol:
    E101-A No:12
      Page(s):
    2320-2326

    Recently, it has been reported that malicious third-party IC vendors often insert hardware Trojans into their products. Especially in IC design step, malicious third-party vendors can easily insert hardware Trojans in their products and thus we have to detect them efficiently. In this paper, we propose a machine-learning-based hardware-Trojan detection method for gate-level netlists using multi-layer neural networks. First, we extract 11 Trojan-net feature values for each net in a netlist. After that, we classify the nets in an unknown netlist into a set of Trojan nets and that of normal nets using multi-layer neural networks. By experimentally optimizing the structure of multi-layer neural networks, we can obtain an average of 84.8% true positive rate and an average of 70.1% true negative rate while we can obtain 100% true positive rate in some of the benchmarks, which outperforms the existing methods in most of the cases.

  • CAM Processor Synthesis Based on Behavioral Descriptions

    Nozomu TOGAWA  Tatsuhiko WAKUI  Tatsuhiko YODEN  Makoto TERAJIMA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Co-design and High-level Synthesis

      Vol:
    E83-A No:12
      Page(s):
    2464-2473

    CAM (Content Addressable Memory) units are generally designed so that they can be applied to variety of application programs. However, if a particular application runs on CAM units, some functions in CAM units may be often used and other functions may never be used. We consider that appropriate design for CAM units is required depending on the requirements for a given application program. This paper proposes a CAM processor synthesis system based on behavioral descriptions. The input of the system is an application program written in C including CAM functions, and its output is hardware descriptions of a synthesized processor and a binary code executed on it. Since the system determines functions in CAM units and synthesizes a CAM processor depending on the requirements of an application program, we expect that a synthesized CAM processor can execute the application program with small processor area and delay. Experimental results demonstrate its efficiency and effectiveness.

  • A Floorplan Aware High-Level Synthesis Algorithm with Body Biasing for Delay Variation Compensation

    Koki IGAWA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E100-A No:7
      Page(s):
    1439-1451

    In this paper, we propose a floorplan aware high-level synthesis algorithm with body biasing for delay variation compensation, which minimizes the average leakage energy of manufactured chips. In order to realize floorplan-aware high-level synthesis, we utilize huddle-based distributed register architecture (HDR architecture). HDR architecture divides the chip area into small partitions called a huddle and we can control a body bias voltage for every huddle. During high-level synthesis, we iteratively obtain expected leakage energy for every huddle when applying a body bias voltage. A huddle with smaller expected leakage energy contributes to reducing expected leakage energy of the entire circuit more but can increase the latency. We assign control-data flow graph (CDFG) nodes in non-critical paths to the huddles with larger expected leakage energy and those in critical paths to the huddles with smaller expected leakage energy. We expect to minimize the entire leakage energy in a manufactured chip without increasing its latency. Experimental results show that our algorithm reduces the average leakage energy by up to 39.7% without latency and yield degradation compared with typical-case design with body biasing.

  • FPGA-Based Reconfigurable Adaptive FEC

    Kazunori SHIMIZU  Jumpei UCHIDA  Yuichiro MIYAOKA  Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-System Level Design

      Vol:
    E87-A No:12
      Page(s):
    3036-3046

    In this paper, we propose a reconfigurable adaptive FEC system. In adaptive FEC schemes, the error correction capability t is changed dynamically according to the communication channel condition. If a particular error correction capability t is given, we can implement an FEC decoder which is optimal for t by taking the number of operations into consideration. Thus, reconfiguring the optimal FEC decoder dynamically for each error correction capability allows us to maximize the throughput of each decoder within a limited hardware resource. Based on this concept, our reconfigurable adaptive FEC system can reduce the packet dropping rate more efficiently than conventional fixed hardware systems. We can improve data transmission throughput for a reliable transport protocol. Practical simulation results are also shown.

  • Scan-Based Attack against Trivium Stream Cipher Using Scan Signatures

    Mika FUJISHIRO  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E97-A No:7
      Page(s):
    1444-1451

    Trivium is a synchronous stream cipher using three shift registers. It is designed to have a simple structure and runs at high speed. A scan-based side-channel attack retrieves secret information using scan chains, one of design-for-test techniques. In this paper, a scan-based side-channel attack method against Trivium using scan signatures is proposed. In our method, we reconstruct a previous internal state in Trivium one by one from the internal state just when a ciphertext is generated. When we retrieve the internal state, we focus on a particular 1-bit position in a collection of scan chains and then we can attack Trivium even if the scan chain includes other registers than internal state registers in Trivium. Experimental results show that our proposed method successfully retrieves a plaintext from a ciphertext generated by Trivium.

  • An Effective Suspicious Timing-Error Prediction Circuit Insertion Algorithm Minimizing Area Overhead

    Shinnosuke YOSHIDA  Youhua SHI  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1406-1418

    As process technologies advance, timing-error correction techniques have become important as well. A suspicious timing-error prediction (STEP) technique has been proposed recently, which predicts timing errors by monitoring the middle points, or check points of several speed-paths in a circuit. However, if we insert STEP circuits (STEPCs) in the middle points of all the paths from primary inputs to primary outputs, we need many STEPCs and thus require too much area overhead. How to determine these check points is very important. In this paper, we propose an effective STEPC insertion algorithm minimizing area overhead. Our proposed algorithm moves the STEPC insertion positions to minimize inserted STEPC counts. We apply a max-flow and min-cut approach to determine the optimal positions of inserted STEPCs and reduce the required number of STEPCs to 1/10-1/80 and their area to 1/5-1/8 compared with a naive algorithm. Furthermore, our algorithm realizes 1.12X-1.5X overclocking compared with just inserting STEPCs into several speed-paths.

  • Scan-Based Attack on AES through Round Registers and Its Countermeasure

    Youhua SHI  Nozomu TOGAWA  Masao YANAGISAWA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E95-A No:12
      Page(s):
    2338-2346

    Scan-based side channel attack on hardware implementations of cryptographic algorithms has shown its great security threat. Unlike existing scan-based attacks, in our work we observed that instead of the secret-related-registers, some non-secret registers also carry the potential of being misused to help a hacker to retrieve secret keys. In this paper, we first present a scan-based side channel attack method on AES by making use of the round counter registers, which are not paid attention to in previous works, to show the potential security threat in designs with scan chains. And then we discussed the issues of secure DFT requirements and proposed a secure scan scheme to preserve all the advantages and simplicities of traditional scan test, while significantly improve the security with ignorable design overhead, for crypto hardware implementations.

  • Floorplan Driven Architecture and High-Level Synthesis Algorithm for Dynamic Multiple Supply Voltages

    Shin-ya ABE  Youhua SHI  Kimiyoshi USAMI  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E96-A No:12
      Page(s):
    2597-2611

    In this paper, we propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delay into high-level synthesis. In AVHDR architecture, voltages can be dynamically assigned for energy reduction. In other words, low supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Next, an AVHDR-based high-level synthesis algorithm is proposed. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, the modules in each huddle can be placed close to each other and the corresponding AVHDR architecture can be generated and optimized with floorplanning information. Experimental results show that on average our algorithm achieves 43.9% energy-saving compared with conventional algorithms.

  • Scan-Based Side-Channel Attack on the LED Block Cipher Using Scan Signatures

    Mika FUJISHIRO  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER-Logic Synthesis, Test and Verification

      Vol:
    E97-A No:12
      Page(s):
    2434-2442

    LED (Light Encryption Device) block cipher, one of lightweight block ciphers, is very compact in hardware. Its encryption process is composed of AES-like rounds. Recently, a scan-based side-channel attack is reported which retrieves the secret information inside the cryptosystem utilizing scan chains, one of design-for-test techniques. In this paper, a scan-based attack method on the LED block cipher using scan signatures is proposed. In our proposed method, we focus on a particular 16-bit position in scanned data obtained from an LED LSI chip and retrieve its secret key using scan signatures. Experimental results show that our proposed method successfully retrieves its 64-bit secret key using 36 plaintexts on average if the scan chain is only connected to the LED block cipher. These experimental results also show the key is successfully retrieved even if the scan chain includes additional 130,000 1-bit data.

  • Efficient Multiplexer Networks for Field-Data Extractors and Their Evaluations

    Koki ITO  Kazushi KAWAMURA  Yutaka TAMIYA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E100-A No:4
      Page(s):
    1015-1028

    As seen in stream data processing, it is necessary to extract a particular data field from bulk data, where we can use a field-data extractor. Particularly, an (M,N)-field-data extractor reads out any consecutive N bytes from an M-byte register by connecting its input/output using multiplexers (MUXs). However, the number of required MUXs increases too much as the input/output byte widths increase. It is known that partitioning a MUX network leads to reducing the number of MUXs. In this paper, we firstly pick up a multi-layered MUX network, which is generated by repeatedly partitioning a MUX network into a collection of single-layered MUX networks. We show that the multi-layered MUX network is equivalent to the barrel shifter from which redundant MUXs and wires are removed, and we prove that the number of required MUXs becomes the smallest among MUX-network-partitioning based field-data extractors. Next, we propose a rotator-based MUX network for a field-data extractor, which is based on reading out a particular data in an input register to a rotator. The byte width of the rotator is the same as its output register and hence we no longer require any extra wires nor MUXs. By rotating the input data appropriately, we can finally have a right-ordered data into an output register. Experimental results show that a multi-layered MUX network reduces the number of required gates to construct a field-data extractor by up to 97.0% compared with the one using a naive approach and its delay becomes 1.8ns-2.3ns. A rotator-based MUX network with a control circuit also reduces the number of required gates to construct a field-data extractor by up to 97.3% compared with the one using a naive approach and its delay becomes 2.1ns-2.9ns.

  • X-Handling for Current X-Tolerant Compactors with More Unknowns and Maximal Compaction

    Youhua SHI  Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Logic Synthesis, Test and Verfication

      Vol:
    E92-A No:12
      Page(s):
    3119-3127

    This paper presents a novel X-handling technique, which removes the effect of unknowns on compacted test response with maximal compaction ratio. The proposed method combines with the current X-tolerant compactors and inserts masking cells on scan paths to selectively mask X's. By doing this, the number of unknown responses in each scan-out cycle could be reduced to a reasonable level such that the target X-tolerant compactor would tolerate with guaranteed possible error detection. It guarantees no test loss due to the effect of X's, and achieves the maximal compaction that the target response compactor could provide as well. Moreover, because the masking cells are only inserted on the scan paths, it has no performance degradation of the designs. Experimental results demonstrate the effectiveness of the proposed method.

  • A Selective Scan Chain Reconfiguration through Run-Length Coding for Test Data Compression and Scan Power Reduction

    Youhua SHI  Shinji KIMURA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Test

      Vol:
    E87-A No:12
      Page(s):
    3208-3215

    Test data volume and power consumption for scan-based designs are two major concerns in system-on-a-chip testing. However, test set compaction by filling the don't-cares will invariably increase the scan-in power dissipation for scan testing, then the goals of test data reduction and low-power scan testing appear to be conflicted. Therefore, in this paper we present a selective scan chain reconfiguration method for test data compression and scan-in power reduction. The proposed method analyzes the compatibility of the internal scan cells for a given test set and then divides the scan cells into compatible classes. After the scan chain reconfiguration a dictionary is built to indicate the run-length of each compatible class and only the scan-in data for each class should be transferred from the ATE to the CUT so as to reduce test data volume. Experimental results for the larger ISCAS'89 benchmarks show that the proposed approach overcomes the limitations of traditional run-length coding techniques, and leads to highly reduced test data volume with significant power savings during scan testing in all cases.

  • High-Level Area/Delay/Power Estimation for Low Power System VLSIs with Gated Clocks

    Shinichi NODA  Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER

      Vol:
    E85-A No:4
      Page(s):
    827-834

    At high-level synthesis for system VLSIs, their power consumption is efficiently reduced by applying gated clocks to them. Since using gated clocks causes the reduction of power consumption and the increase of area/delay, estimating trade-off between power and area/delay by applying gated clocks is very important. In this paper, we discuss the amount of variance of area, delay and power by applying gated clocks. We propose a simple gate-level circuit model and estimation equations. We vary parameters in our proposed circuit model, and evaluate power consumption by back-annotating gate-level simulation results to the original circuit. This paper also proposes a conditional expression for applying gated clocks. The expression shows whether or not we can reduce power consumption by applying gated clocks. We confirm the accuracy of proposed estimation equations by experiments.

  • A High-Speed Trace-Driven Cache Configuration Simulator for Dual-Core Processor L1 Caches

    Masashi TAWADA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1283-1292

    Recently, multi-core processors are used in embedded systems very often. Since application programs is much limited running on embedded systems, there must exists an optimal cache memory configuration in terms of power and area. Simulating application programs on various cache configurations is one of the best options to determine the optimal one. Multi-core cache configuration simulation, however, is much more complicated and takes much more time than single-core cache configuration simulation. In this paper, we propose a very fast dual-core L1 cache configuration simulation algorithm. We first propose a new data structure where just a single data structure represents two or more multi-core cache configurations with different cache associativities. After that, we propose a new multi-core cache configuration simulation algorithm using our new data structure associated with new theorems. Experimental results demonstrate that our algorithm obtains exact simulation results but runs 20 times faster than a conventional approach.

  • Scan-Based Side-Channel Attack against RSA Cryptosystems Using Scan Signatures

    Ryuta NARA  Kei SATOH  Masao YANAGISAWA  Tatsuo OHTSUKI  Nozomu TOGAWA  

     
    PAPER-Logic Synthesis, Test and Verification

      Vol:
    E93-A No:12
      Page(s):
    2481-2489

    Scan-based side-channel attacks retrieve a secret key in a cryptography circuit by analyzing scanned data. Since they must be considerable threats to a cryptosystem LSI, we have to protect cryptography circuits from them. RSA is one of the most important cryptography algorithms because it effectively realizes a public-key cryptography system. RSA is extensively used but conventional scan-based side-channel attacks cannot be applied to it because it has a complicated algorithm. This paper proposes a scan-based side-channel attack which enables us to retrieve a secret key in an RSA circuit. The proposed method is based on detecting intermediate values calculated in an RSA circuit. We focus on a 1-bit time-sequence which is specific to some intermediate values. By monitoring the 1-bit time-sequence in the scan path, we can find out the register position specific to the intermediate value and we can know whether this intermediate value is calculated or not in the target RSA circuit. We can retrieve a secret key one-bit by one-bit from MSB to LSB. The experimental results demonstrate that a 1,024-bit secret key used in the target RSA circuit can be retrieved using 30.2 input messages within 98.3 seconds and its 2,048-bit secret key can be retrieved using 34.4 input within 634.0 seconds.

61-78hit(78hit)